NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

History-driven test program synthesis for JVM testing

https://doi.org/10.1145/3510003.3510059

Zhao, Yingquan; Wang, Zan; Chen, Junjie; Liu, Mengdi; Wu, Mingyuan; Zhang, Yuqun; Zhang, Lingming (May 2022, International Conference on Software Engineering)

Full Text Available
Practical Accuracy Estimation for Efficient Deep Neural Network Testing

https://doi.org/10.1145/3394112

Chen, Junjie; Wu, Zhuo; Wang, Zan; You, Hanmo; Zhang, Lingming; Yan, Ming (October 2020, ACM Transactions on Software Engineering and Methodology)
null (Ed.)
Deep neural network (DNN) has become increasingly popular and DNN testing is very critical to guarantee the correctness of DNN, i.e., the accuracy of DNN in this work. However, DNN testing suffers from a serious efficiency problem, i.e., it is costly to label each test input to know the DNN accuracy for the testing set, since labeling each test input involves multiple persons (even with domain-specific knowledge) in a manual way and the testing set is large-scale. To relieve this problem, we propose a novel and practical approach, called PACE (which is short for P ractical AC curacy E stimation), which selects a small set of test inputs that can precisely estimate the accuracy of the whole testing set. In this way, the labeling costs can be largely reduced by just labeling this small set of selected test inputs. Besides achieving a precise accuracy estimation, to make PACE more practical it is also required that it is interpretable, deterministic, and as efficient as possible. Therefore, PACE first incorporates clustering to interpretably divide test inputs with different testing capabilities (i.e., testing different functionalities of a DNN model) into different groups. Then, PACE utilizes the MMD-critic algorithm, a state-of-the-art example-based explanation algorithm, to select prototypes (i.e., the most representative test inputs) from each group, according to the group sizes, which can reduce the impact of noise due to clustering. Meanwhile, PACE also borrows the idea of adaptive random testing to select test inputs from the minority space (i.e., the test inputs that are not clustered into any group) to achieve great diversity under the required number of test inputs. The two parallel selection processes (i.e., selection from both groups and the minority space) compose the final small set of selected test inputs. We conducted an extensive study to evaluate the performance of PACE based on a comprehensive benchmark (i.e., 24 pairs of DNN models and testing sets) by considering different types of models (i.e., classification and regression models, high-accuracy and low-accuracy models, and CNN and RNN models) and different types of test inputs (i.e., original, mutated, and automatically generated test inputs). The results demonstrate that PACE is able to precisely estimate the accuracy of the whole testing set with only 1.181%∼2.302% deviations, on average, significantly outperforming the state-of-the-art approaches.
more » « less
Full Text Available
Exposing numerical bugs in deep learning via gradient back-propagation

https://doi.org/10.1145/3468264.3468612

Yan, Ming; Chen, Junjie; Zhang, Xiangyu; Tan, Lin; Wang, Gan; Wang, Zan (August 2021, Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering)

Full Text Available
TECCD: A Tree Embedding Approach for Code Clone Detection

https://doi.org/10.1109/ICSME.2019.00025

Gao, Yi; Wang, Zan; Liu, Shuang; Yang, Lin; Sang, Wei; Cai, Yuanfang (September 2019, IEEE International Conference on Software Maintenance and Evolution (ICSME))

Clone detection techniques have been explored for decades. Recently, deep learning techniques has been adopted to improve the code representation capability, and improve the state-of-the-art in code clone detection. These approaches usually require a transformation from AST to binary tree to incorporate syntactical information, which introduces overheads. Moreover, these approaches conduct term-embedding, which requires large training datasets. In this paper, we introduce a tree embedding technique to conduct clone detection. Our approach first conducts tree embedding to obtain a node vector for each intermediate node in the AST, which captures the structure information of ASTs. Then we compose a tree vector from its involving node vectors using a lightweight method. Lastly Euclidean distances between tree vectors are measured to determine code clones. We implement our approach in a tool called TECCD and conduct an evaluation using the BigCloneBench (BCB) and 7 other large scale Java projects. The results show that our approach achieves good accuracy and recall and outperforms existing approaches.
more » « less
Full Text Available

Search for: All records